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Abstract — Multidimensional scaling (MDS) is a class of 
projective algorithms traditionally used to produce two- 
or three-dimensional visualizations of datasets consisting 
of multidimensional objects or interobject distances. Re- 
cently, metric MDS has been appUed to the problems 
of graph embedding for the purpose of approximate 
encoding of edge or path costs using node coordinates 
in metric space. Several authors have also pointed out 
that for data with an inherent hierarchical structure, 
hyperbolic target space may be a more suitable choice for 
accurate embedding than Euclidean space. In this paper 
we present the theory and the implementation details of 
MDS-PD, a metric MDS algorithm designed specifically 
for the Poincare disk model of the hyperbolic plane. Our 
construction is based on an approximate hyperboUc line 
search and exemplifies some of the particulars that need to 
be addressed when applying iterative optimization methods 
in a hyperbolic space model. MDS-PD can be used both 
as a visualization tool and as an embedding algorithm. We 
provide several examples to illustrate the utility of MDS- 
PD. 

Index Terms — dimensionality reduction, hyperbolic em- 
bedding, hyperboUc MDS, network graph, steepest descent, 
visuaUzation 

I. INTRODUCTION 



Metric multidimensional scaling (MDS) ([Carroll and 



|Arabie"1980'; De Leeuw and He iser||T982l |Cox and Cox 
[2000; Borg and Groenen^ 2005) is a class of algorithms 
that take as input some or all of the inter-object distances 
(pair dissimilarities) for n objects and produce as output 
a point configuration of n points specified by their 
coordinates in a chosen J-dimensional target space. The 
goal is to return the point configuration whose inter- 
point distances in the J-dimensional space match as 
closely as possible the original input distances. Usually, 
this goal is pursued by minimizing a scalar badness- 
of-fit objective function defined for an arbitrary ?i-point 
configuration in the target space; ideally, the output of an 
MDS algorithm should be the configuration that achieves 
the global minimum of the objective function. 



If the target space dimension is 2 or 3, the output con- 
figuration can be graphically represented, which makes 
MDS a visualization tool seeking to preserve the input 
distances as faithfully as possible, thus clustering the 
objects in the target space by similarity. More generally, 
for a given dimension d, metric multidimensional scaling 
can be used to embed an input set of dissimilarities of 
the original objects into a J-dimensional metric space. 

In order to apply MDS, several design decisions must 
be made. One first needs to choose a target metric space 
of appropriate dimension d and a corresponding distance 
function. An objective function should be chosen so 
that it provides a suitable measure of inaccuracy for a 
given embedding application. If the objective function 
is nonlinear but satisfies some mild general conditions 
(smoothness), a numerical optimization method can be 



chosen (e.g. Nocedal and Wright 19991 for the imple- 
mentation of the embedding algorithm. 

The Euclidean plane is the most common choice 
of target space for visualization applications due to 
its simplicity and intuitiveness. Spherical surface can 
be used, for example, to avoid the edge effect of a 



planar representation (Cox and Cox 1991) . In general, 
MDS on curved subspaces of Euclidean space can be 
viewed as MDS in a higher dimensional Euclidean space 



constrained to a particular surface ( [Bentler and Weeks 
[T978; ,Bloxom[[T978l ). 

The use of metric MDS in the hyperbolic plane was 
proposed in the context of interactive visualization by 
Walter ( |2004 ), inspired by the focus and context hyper- 



bolic tree viewer of |Lamping and Rao[ ( [1994 ). The use of 



MDS in (Walter 2004) inherently generalized the appli- 
cability of the method from tree structures to continuous- 
valued multidimensional data or pair distances. It was 
noted that the curious property of exponential growth 
of the "available space" in the hyperbolic models as 
one moves towards infinity, makes planar hyperbolic 
embedding a suitable choice for both hierarchical and 
high-dimensional data. The adequacy of the hyperbolic 



2 



spaces for embedding of various data was also studied 
and confirmed in the contexts of network embedding for 
path cost estimation (Shavitt and Tankel"2008| and rout- 



ing (,Kleinberg,,2007, ^Krioukov et al._2009; jCvetkovski 


and Crovella 


2009; Papadopoulos et al. 


20101. 



A least squares formulation of MDS, to be used 
in conjunction with an iterative numerical method for 
unconstrained optimization was proposed by [Sammon 



(1969 1. The objective function therein (the Sammon 
stress criterion) was defined as a normalized sum of the 
squared differences between the original dissimilarities 
and the embedded distances of the final point configu- 
ration. To minimize this function, Sammon proposed a 
descent method with step components calculated using 
the first two component derivatives of the objective func- 
tion. Walter (2004 ) adopted Sammon's badness-of-fit 
measure for hyperbolic MDS but observed that applying 
Sammon's iterative procedure in the Poincare disk (PD) 
using exact derivatives is difficult due to the complicated 
symbolic expressions of the second derivative of the 
hyperbolic distance function in this model. Subsequently, 
the Levenberg-Marquardt least squares method was ap- 
plied in (Walter 2004[ ), using only first-order derivatives 
for the optimization, but the details of applying this 
iterative method in the Poincare disk were not elaborated. 

In this paper we present MDS-PD, a metric multidi- 
mensional scaling algorithm using the Poincare disc (PD) 
model. MDS-PD is based on a steepest decent method 
with line search. We show the details of the steepest 
descent along hyperbolic lines in the PD and present a 
suitable approximate hyperbolic line search procedure. 
MDS-PD is applicable in its own right; additionally, its 
construction also illustrates some of the specifics that 
need to be considered when transferring more sophis- 
ticated iterative optimization methods to the PD or to 
other hyperbolic models. Based on this development, we 
show the particulars of a numerical implementation of 
MDS-PD and use this algorithm to carry out numerical 
experiments on several datasets using least squares cri- 
terion functions. Our simulation results indicate that the 
performance of a steepest descent method for minimizing 
a least squares objective on large configurations in the 
PD is notably dependent on the line search method used, 
and that binary hyperbolic line search provides markedly 
better convergence and cost properties for MDS-PD 
compared to more sophisticated or precise methods. 

The rest of this paper is organized as follows. Section 
[njconsolidates the notation and concepts from hyperbolic 
geometry that will be used throughout, and proceeds to 



develop two of the building blocks of MDS-PD - steep- 
est descent in the PD and a corresponding hyperbolic 



line search. Section III considers particular objective 
functions and gradients and further discusses properties 
and applicability of multidimensional scaling in the PD. 



Section IV presents illustrative examples of MDS-PD 
operation in the context of synthetic as well as real-world 
input data. Related work is discussed in Section [V] and 



concluding remarks are given in Section VI 



II. A DESCENT METHOD FOR THE POINCARE 

DISK 

We start this section by introducing our notational 
conventions and establishing some properties of the 
Poincare disk that will be useful in the subsequent devel- 
opment. This will allow us to formally define a Poincare- 
disk specific descent method and a binary hyperbolic 
line search algorithm, that together make a simple, yet 
efficient iterative minimization method for this model of 
the hyperbolic plane. 

A. Preliminaries 

The Poincare Disk model of the hyperbolic plane is 
convenient for our considerations since it has circular 
symmetry and a closed form of the inter-point distance 
formula exists ( Anderson^^2007) . We will be using com- 
plex rectangular coordinates to represent the points of 
the hyperbolic plane, making the PD model a subset of 
the complex plane C: 



{zgC I |z| < 1}. 



(1) 



The hyperbolic distance between two points Zj and Zk 
in D is given by 

do {zj,Zk)= 2atanh f-^ ^ , (2) 

\i-ZjZk\ 

where z denotes the complex conjugate. 

Mobius transformations are a class of transformations 
of the complex plane that preserve generalized circles. 
The special Mobius transformations that take D to D and 
preserve the hyperbolic distance have the form 

az + b 



T{z) 



a,b £ 



^ 0. (3) 



bz + a 

Given a point zo ^ D and a direction 7 G C with 
1 7I = 1 , we can travel a hyperbolic distance s >0 along 
a hyperbolic line starting from zo in the direction 7, 
arriving at the point Zq- 
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Lemma 1. For zo S 

the point 



7 G C with 



1, and s >0, 



7tanh| +Z0 
zoytanh f + 1 

( i) belongs to the hyperbolic ray passing through zq and 
having direction y at zo, and 

(ii) do{zQ,z.o) =s. 

Proof: Given a point zo S D and a direction 7 G C 
with I7I = 1, the hyperbolic ray in D passing through 
Zo and having direction 7 at zo can be parametrized by 
re [0,1) as 

rV-i- 7n 

(4) 



rf . rj+ZQ 



Noting that (|4]), seen as a function of z = rj: 

X Z + ZO 
T U) = ^—r 

zzo + l 

is a Mobius transformation taking D to D and preserving 
hyperboUc distances, we see that 

1 + r 



do if (r) ,Zo) = do (0, r) = In 



1 



whence it follows that moving zo along a hyper- 
bolic line in the direction 7 by a hyperbolic distance 
s = ln((l + r)/(l — r)) we arrive at the point Zq = 
/(tanhf). 

■ 

Next, we introduce some of the notation that will be 
used throughout. 

• Let the point configuration at iteration t = 1,2, . . .T 
consist of n points in the Poincare disk D 

Zj{t), i=l...n 
represented by their rectangular coordinates: 

Zj (0 = yjs (0 + iyj,i (t) , / = V^, yj^i ,yj,2eR 

with \zjit) \ < 1. 

• We also use vector notation to refer to the point 
configuration 

1 T 



Z{t) = [ Zl{t) Z2(0 ... Zn{t) ^ 

= [^1,1(0 yi.lit) ... yn,i{t) 
i[ yi,2{t) 3^2,2(0 ... yn.lit) Y 



where [.] in this work indicates the real matrix 
transpose (to be distinguished from the complex 
conjugate transpose.) 

The distance matrix for a given point configuration 
z is the real valued symmetric matrix D (z) = 



['^i'^lnxn ^^'^^^ entry dj^ is the hyperbolic distance 
between points zj and Zk in the configuration z: 

djk = do{zj,Zk)- 

The dissimilarity matrix A = is a symmet- 

ric, real- valued matrix containing the desired inter- 
point distances of the final output configuration (the 
dissimilarities). The diagonal elements are 5jj = 
and all other entries are positive real numbers: 
5jk = 8kj>0 for j^k. 

The indicator matrix I = ^ symmetric 

0-1 matrix, used to allow for missing dissimilarity 
values. The entries of I corresponding to missing 
values in A are set to 0. All other entries are set to 
1. 

The weight matrix W = [wyi]^^^ is a symmetric, 
real-valued matrix introduced to enable weighting 
of the error terms for individual pairs of points in 
the objective function sum. For convenience, wjk 
corresponding to missing dissimilarities are set to 
some finite value, e.g. 1. 

The objective function to be minimized is the 
embedding error function Et =£?(z,A,W,I) that, 
given the sets of dissimilarities and weights, as- 
sociates to a configuration z an embedding error 
Ef. An example of an error function is the sum of 
relative squared differences 

djk{t)-5jk^ ^ 



5i, 



£,(z,A,W,I) = ^ ^ wjkljk 
j=ik=j+i 

(5) 

The objective function can optionally be normaUzed 
per pair by dividing with the number of summands 
(n^-n) /2. 



B. Descent in the Poincare disk 

Given a configuration of points z, matrices A, W, 
and I, the distance function do{zj,Zk), and an objective 
function £(z, A,W,I), define 



g = V£ 



def 



dE I I dE 
dE I ■ dE 



dE I ^ dE 
dy„,i dy„2 





gl 




82 







(6) 



According to Lemma [T] moving the points zi, ■ ■ ■ ,Zn 

of 

the configuration z along distance realizing paths in the 
PD defined respectively by the directions —gi —gn 
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Figure 1. An example of moving a 4-point configuration in a given 
(descent) direction along distance realizing paths of the Poincare disk 



at z (Fig. [T]) will result in configuration z' with points 

where r > is the step-size parameter which determines 
the hyperbolic distances Sj traveled by zf 

,j - ■■■ 



In 



1 



\Sj\ 



(8) 



The PD model (|]JI implies the constraints \zi\ < 1 for 
the point coordinates. Still, the optimization on the PD 
can be viewed as unconstrained by observing that the 
constraints z'j < 1 will not be violated while moving a 
configuration z in D if the distances sj traveled by each 
point are always kept finite, i.e. 



maXjSj < oo. 



Since (jojl, according to (|8|, corresponds to rmax^- |gj 
1, we have the constraint on r 



(9) 

I < 



r < 



|g|| 



When implementing iterative descent minimization 
methods with line search in the Poincare disk, it is 
important to specify a hyperbolic distance window sm 
along the descent lines where the next configuration will 
be sought. In this case the corresponding value of the 
parameter r is 

' ' (10) 



rM 



■ tanh — < 
2 



g 2 g 

Since the Poincare disk model is conformal, following 
the direction — g (the opposite of ([6])) corresponds to the 
steepest descent optimization method. Moving the point 
configuration along hyperbolic lines (distance realizing 



paths), on the other hand, ensures that the steepest 
descent direction is exhausted most efficiently given the 
current information about the objective function. 

C. A Steepest Descent Algorithm for the PD 
Figure [2] shows a framework for MDS-PD. 

Algorithm MDS-PD 

Input data: 

an initial configuration z ( 1 ) 

the dissimilarities A, weights W, indicators I 
Input parameters: 

an objective function £(z, A,W,I) 

the stopping tolerances Ee, £ae, £g, £r, Tm 
Output: 

a final point configuration z (J) 

a final embedding error Ej 

Initialize: 

t^\; SM^IO; £"_! ^oo; z^z(l); {|2]l} 

Loop: 



or l|g| 



£^£(z,A,W,I); g^V£(z,A,W,I); {2 

rM^ j^-t^ahf; p 

Break if 

E<eE 
or £<£a£ 



{2 



2} 
3} 



4} 



or rM < £r 
or t > Tm; 

E-i ^E; 

r ^HypLineSearch(£'(z, A, W, I) , -g, 0, tm) ; . {05 } 

t -^t + l; 

Return z(r) ^z and Et ^£(z, A,W,I). 



6} 



Figure 2. MDS-PD 

The input data of MDS-PD consists of the initial con- 
figuration z(l), and the input metric: the dissimilarities 
A with the associated weights W and the indicators of 
missing dissimilarities I. The input parameters are the 
objective error function £(z,A,W,I) and the stopping 
tolerances Ee, £ae, %> £r> and Tm- The output of MDS- 
PD consists of the final point configuration z{T) and its 
associated embedding error Ej = E {z [T] , A, W,I). 

The initialization {[2] 1 } sets the maximum hyperbolic 
distance sm that can be traveled by any point of the 
configuration, and the previous value of the embedding 
error i . 

Each iteration starts by determining the gradient of 
the error in the current configuration {|2]2} and the 



corresponding window {[2] 3} for the parameter r 
(Eq. (flOb). A hyperboUc line search (described in Sec. 



II-Di is performed {|2]5} in the direction of the steepest 
descent — g of the embedding error and the resulting step- 
size parameter r is used in {[2] 6} to arrive at the next 
configuration as in ([7]). 

Several stopping criteria are used (line {|2]4}) to 
terminate the search. Ideally, the algorithm exits when 
the embedding error is close to (£ < Se)- Termination 
also occurs in the cases when the error decreases too 
slowly (£_i —E< Eae), or when the gradient or the step- 
ping parameter become too small (||g||o<, < £g, < £,)■ 
Finally, Tm, the maximum allowed number of iterations, 
is used as a guard against infinite looping. 

The line search subprogram used in {[2]5} is described 
next. 

D. Approximate Hyperbolic Line Search 

An exact line search could be used in line {|2]5} (Fig. 
[2]) to determine a value for the step size r such that the 
corresponding new configuration {[2] 6} achieves a local 
minimum of the embedding error along the search path 
with tight tolerance: 

r^argmin.gjo.rMl^W' (H) 

where q (r) is the embedding error as a function of r. 

However, increasing the precision of this computation 
is not essential to the convergence performance since the 
steepest descent search direction is only locally optimal. 
Furthermore, exact line search can fail to converge to 
a local minimum even for a second degree polynomial 
due to finite machine precision ( Frandsen et al.||2004[ ). It 
is now accepted in the numerical optimization literature 
that approximate line search provides convergence rates 
comparable to the exact line search while significantly 
reducing the computational cost per line search. In fact. 



the step calculation used in Sammon (1969 1 is a "zero- 
iteration" approximate line search, where the step size is 
simply guessed based on the first two derivatives of the 
error. Conceivably, the simplest inexact step calculation 
would guess the step size based only on the directional 
gradient at the current configuration. 

Approximate line search procedures aim to reduce the 
computational cost of determining the step parameter by 
posing weaker conditions on the found solution: Rather 
than searching for a local or global minimizer of q{r) 
on (0,rM], a value is returned by the line search function 
as satisfactory if it provides sufficient decrease of the 
objective function and sufficient progress toward the 




Figure 3. Acceptable step lengths for inexact line search obtained 
from the sufficient decrease condition. 



solution configuration. A popular approach to defining 
sufficient decrease is to define the "roof" function 

X{r)=q{0)+p-q' {0)-r, 0<p<\ (12) 

which is a line passing through (0,^(0)) and having a 
slope which is a fraction of the slope of q{r) at r = 0. 
With this function, we define that sufficient decrease is 
provided by all values of r such that 



q{r)<X{r), r£ (0,rM] 



(13) 



Fig. [3] shows an example of acceptable step length 
segments obtained from the sufficient decrease condition 
([13). 

To ensure sufficient progress, we adopt a binary 
search algorithm motivated by the simple backtracking 
approach (e.g. |Nocedal and Wright] (1999) ). The details 
are given in Fig. [4] 

We start the line search with an initial guess ro for 
the step size parameter, and in the expansion phase 
{|4]l} we double it until it violates the window rM 
or the sufficient decrease condition. In the reduction 
phase {|4]2}, we halve r until it finally satisfies both the 
window requirement r < rM and the decrease criterion 
q{r)<X{r). 

We observe that, when started at a point with nonzero 
gradient, the line search will always return a nonzero 
value for r. Since the returned acceptable step r is such 
that the step 2 • r is not acceptable, there will be a 
maximum acceptable point r,„ from the same acceptable 
segment as r, such that r <r,„ <2- r, whence r>rm/2. 
In other words, the returned value is always in the upper 
half of the interval [0, r,„] and we accept this as sufficient 
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Procedure HypLineSearch 



Input data: 

an initial guess of the step parameter ro 

the maximum step value tm 

the function q (r) 
Input parameters: 

the slope parameter p for the roof function A (r) ; 
Output: 

an acceptable step parameter r 

Initialize: 

r ^ ro; 

While r < fm and q{r) < X (r) , 

r^2-r; 

While r < tm or q (r) > X (r) , 

r^r/2; ^2} 

Return r. 



{01} 



Figure 4. Line search procedure for MDS-PD 



progress toward the solution, thus eliminating some more 
computationally demanding progress criteria that would 
require calculation of q'{r) at points other than r = 



or cannot always return a nonzero r (cf. |Nocedal and 
Wright||T999| [Frandsen et aL|[2004l ). 



It remains to show how to calculate the slope of A (r), 
that is pq' (0) (Eq. [12]). Given a configuration z and a 
direction — g = ~VE (z, A,W,I), the configuration z' as 
a function of r Q can be conveniently represented as a 
column-vector function 

M(-rg,z) (14) 

whose 7'-th entry is the Mobius transform 

-rgjZ]+l 

The associated embedding error as a function of r is then 
^(r)=£(M(-rg,z),A,W,I), (15) 
and it can be easily shown that its slope is given by 

= {ReM' {-rg,z)f ReVE {M{~rg,z) ,A,W,I) 
+ {imM' (-rg, z)) ^ Im VE (M (-rg, z) , A, W, I) 



where the entries of M'(— rg,z) are given by 

|2 



Mj{r) = j-Mj{r)=gj- 



Zi 



1 



\2 ■ 



We thus have a general explicit formula for calculating 
q' (r) given a configuration z and the corresponding 
gradient g of £ at z. In particular, this formula can be 
used to calculate pq' (0), the slope of A (r). 

III. MULTIDIMENSIONAL SCALING IN THE 

PD 

A. Objective Functions and Gradients 

The iterative minimization method presented in Sec. 
[n] requires a choice of an embedding error function with 
continuous first derivatives. In this work we consider the 
least squares error function 



E = cY, E Cjk{djk 
j=ik=j+i 



a5jk) . 



(16) 



We note that (16 1 is a general form from which several 



special embedding error functions can be obtained by 
substituting appropriate values of the constants c, cjk, 
and a. Examples include: 

• Absolute Differences Squared (ADS) 



^=£ L Wjk[ljk{djk-adjk)Y 

Relative Differences Squared (RDS) 

djk —adjk 



E E 



Ijk- 



a5 



(17) 



(18) 



Sammon Stress Criterion (SAM) 
1 



E E 

«E E ^j^Sj^ 



w 



jk- 



{Ijk{djk-adjk)y 



(19) 

As the most general case of (16), individual importance 
dependent on the input dissimilarities can be assigned to 
the pairwise error terms using the the weights terms wji^. 

MDS-PD also requires calculation of the gradient of 
the error function. For a general error function, closed 
form symbolic derivatives may or may not exist, and 
in the latter case one can resort to approximating the 
gradient using finite difference calculations. Numerical 
approximation may also incur lower computational cost 
than the formal derivatives. However, the use of nu- 
merical derivatives can introduce additional convergence 
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problems due to limited machine precision. 



in the optimization of the embedding error - the dis- 



For the sum (16 1, both approaches are possible. A 
symbolic derivation of the gradient of ( [T6] ), including 
both the Euclidean and hyperbolic cases, can be easily 
carried out and is omitted here for brevity. From the 
obtained result, symbolic derivatives of ([T7])-( 19 1, as 



well as any other special cases derivable from (16 1 can 
be obtained by substituting appropriate constants. 

B. Local vs. Global Minima 

MDS-PD, being a steepest descent method that termi- 
nates at near-zero progress, can find a stationary point 
of the objective function. In the least squares case, if the 
value at the returned solution is close to zero (that is, 
E < Ee), then the final configuration can be considered 
a global minimizer that embeds the input metric with 
no error. In all other cases, a single run of MDS- 
PD cannot distinguish between local and global points 
of minimum or between a minimizer and a stationary 
point. A traditional way of getting closer to the global 
minimum in MDS is to run the minimization multiple 
times with different starting configurations. Expectedly, 
there will be accumulation of the results at several 
values, and the more values are accumulated at the lowest 
accumulation point, the better the confidence that the 
minimal value represents a global minimum i.e. the least 
achievable embedding error. 

C. Input Data Curvature Matching 

The objective functions used in metric Euclidean 
MDS are typically constructed to be scale-invariant in 
the sense that scaling the input dissimilarities and the 
coordinates of the output configuration with the same 
constant factor a does not change the embedding error. 
This is possible for Euclidean space since the Euclidean 
distance function scales by the same constant factor as 
the point coordinates : {Y,s= i ' J j.s ~ ^ ' J/ti ) ^ ) ^ = ^ ■ djk ■ 
Thus, for example, if dj/^ is the Euclidean distance, then 



the sums (fTSII and (fT9|l are scale-invariant, whereas (17i 



is not. However, when dji^ is the hyperbolic distance 
function (|2]l, none of the (fT7])-([T9l) is scale-invariant. 
Therefore, the simplest ADS error function ( [TT] ) may be 
a preferable choice for reducing the computational cost 
in the hyperbolic case. Nevertheless, for our numerical 
experiments we choose to apply the Sammon criterion 



(19 1 so as to facilitate numerical comparison between 



the final embedding errors for the Sammon map in the 
EucUdean plane and MDS-PD. 

The lack of scale-invariance of the hyperbolic distance 
formula (|2]) implies an additional degree of freedom 



similarity scaling factor. In Eqs. (16l-(19l this extra 
degree of freedom is captured via the parameter a that 
scales the original entries of the dissimilarity matrix. The 
dependency of the embedding error of MDS-PD on the 
dissimilarity scaling factor a varies with the type of input 



data and is investigated in more detail in Section IV 



IV. MDS-PD: EXPERIMENTAL RESULTS 

Following the specification in the previous sections, 
we successfully implemented MDS-PD with the error 
functions ([T7])-([T9]). In this section we show a few 
illustrative results of our experimental study using MDS- 
PD on synthetic as well as real-world data. Some of 
the methods we used to verify the correctness of our 
specification are also discussed below. 

A. An Illustrative Example 

As a first example, we used a random configuration 
of seven points in the Poincare disk. We populated the 
input dissimilarity matrix with the hyperbolic inter-point 
distances and started MDS-PD from another randomly- 
generated seven point initial configuration in the PD. Fig. 
[5] shows the trajectories traveled by the points during 
the minimization. The clear points denote the initial 
configuration and the solid points represent the final 
point configuration. Fig. |6] shows the MDS-PD internal 
parameters vs. the iteration number for this example: In 
Fig. |6^, the embedding error E monotonically decreases 
with every iteration; the iterations terminated with the 
E < Ee = 10^^ condition, which means that likely the 
output configuration represents the global minimum and 
the final inter-point distances match the input dissim- 
ilarities very closely. The step-size parameter r was 
initialized with a value of 1 and assumed only values 
of the form 2^, for integral k (Fig. The exponential 
character of the change of r according as {|4]1} and 
{|4]2} (Fig. |4]l ensures the low computational cost of 
the line search subprogram and in our numerical studies 
proved superior to other line-search strategies, including 
exact search or adaptive approximate step-size parameter. 
The refining of the step size as the current configuration 
approaches a local minimum of the error function, on the 
other hand, is achieved by the decrease of the gradient 
norm. This is further illustrated in Figs. ^ and |6]l. 

B. Scaling of Synthetic Data 

To investigate the dependency of the embedding error 
on the dissimilarity scaling factor a, we used as input the 
inter-point distances obtained from random sets of points 
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0.8 -0.6 -0.4 -0.2 0.2 0.4 0.6 0.8 



Figure 5. The minimization trajectory for a seven point configura- 
tion using MDS-PD. The clear and the solid points are respectively 
the initial and the final point configuration. 



(a) Embedding error E 



(b) Step-size parameter r 





(c) Norm of gradient ||g|| 



20 30 
iterations 



(d) Rei. step-size parameter r/r 




20 40 
Iterations 




Figure 6. The MDS-PD internal parameters vs. the iteration number 
for the seven point example of Fig. |5] (a) the embedding eiTor E, (b) 
the step-size parameter r, (c) the norm of the gradient ||g||c„, and (d) 
the step-size parameter relative to the maximum allowed value r/rM- 



residing on surfaces of constant positive, zero or negative 
curvature (i.e. respectively a sphere, a Euclidean plane 
and the Poincare disk model of the hyperbolic plane.) 
The corresponding distances (spherical, Euclidean and 
hyperbolic) for all pairs were used as dissimilarities 
in this experiment. The embedding error function was 



the Sammon criterion (19 1. We also used noisy inputs 



obtained by replacing each original dissimilarity djk 



with a random number uniformly distributed in the 
interval [{I — e,„)5jk, {I + em)Sjk\ for a chosen noise 
level e„, < 1. 

Fig. [7] shows the typical effects of dissimilarity scaling 
for several Euclidean, spherical, and hyperbolic graphs. 
Cases (a), (c), and (e) illustrate the variation of the em- 
bedding error for noiseless input data, with the number 
of points as a parameter (20 and 60 points.) Cases (b), 
(d), and (f) illustrate the variation of the embedding error 
for noisy input data and are parametrized by the amount 
of measurement noise (e,„ = 0,10,20,30%.) Each point 
in the diagrams (a)-(f) was obtained as a minimum 
Sammon stress in a series of 70 replicates of MDS-PD 
for different randomly chosen initial configuration in the 
PD. The smoothness of the obtained curves demonstrates 
that for the chosen problems, this number of replicates 
was enough to approach the global minimum achievable 
embedding error for each simulated value of a. The 
results are drawn on semilogarithmic axes in order to 
show more details toward small a values. 

Locally, the Poincare disk model, distance-wise "looks 
like" the Euclidean plane scaled with some constant 
factor. For example, in the vicinity of a point zo, the 
hyperbolic distance formula (j2j) becomes do{zj,Zk) ~ 
\zj — Zk\ •2/(1 — |zo|^)- Therefore, for a sufficiently small 
scaling factor a and sufficiently many replicates, metric 
MDS implementations for the PD model and for the 
Euclidean plane using the same scale-invariant (for Eu- 
clidean distances) error function, should return approx- 
imately equal embedding errors for the final configura- 
tions. (A sufficiently small value of a is one that would 
make the final configuration land in a sufficiently small 
neighborhood of a point in the PD.) In this sense, MDS- 
PD is a generalization of an Euclidean MDS algorithm. 
We used these observations to verify that our MDS- 
PD implementation returned the expected error values 
for small scaling factors. Indeed, as Fig. [7] shows, for 
small a values, the Euclidean graphs were embeddable 
with no error, and the other two graph types had stress 
that numerically matched the output of other available 
Euclidean MDS implementations using the Sammon 
stress criterion. 

In the cases (e) and (f), the original configurations 
are residing on a hyperbolic plane, and therefore are 
embeddable with zero stress in the PD model for some 
value of a (a = 1 in this synthetic example). For this 
value, our implementation of MDS-PD was able to 
find the original configuration up to hyperbolic -distance 
preserving Mobius transformations. The existence of 
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Figure 7. Effects of dissimilarity scaling for Euclidean, spherical, 
and hyperbolic graphs. Figures (a), (c), and (e) show the variation 
of the embedding error for noiseless input data, with the number of 
nodes as a parameter. Figures (b), (d), and (f) show the variation of 
the embedding error for noisy input data parametrized by the amount 
of measurement noise. 



graphs with a hyperbolic "underlying structure" that are 
embeddable with notably lower stress in 2-dimensional 
hyperbolic than in 2-dimensional Euclidean space is an 
important illustration of the usefulness of MDS in the 
Poincare disk. Candidate graphs having such hyperbolic- 
like structure are real-world communication or social 
networks that tend to have a strongly interconnected core 
and a sparser periphery of tendrils. MDS-PD can be 
used in such contexts to investigate the "hyperbolicity" 
of the input data and arrive at lower stress dissimi- 
larity embedding. The diagrams (b), (d), and (f) (Fig. 
[7]) also demonstrate that relatively high noise levels 
in the measured data do not significantly change the 
suitability for embedding in the PD in the cases when 
the original dissimilarity matrix has a natural underlying 
2-dimensional space. 

C. Real-World Graph Examples 

To further demonstrate the ability of the Poincare disk 
to accommodate lower stress 2-dimensional embeddings 
than classical Euclidean MDS for certain graph types, we 
applied MDS-PD to dissimilarity matrices obtained from 
several real-world datasets. In this section we summarize 
the results. 

I) The Iris Dataset: As a first experiment, we ap- 



plied MDS-PD to the Iris dataset ( |Anderson| 1 1935 1. 
This classical dataset consists of 150 4-dimensional 
points from which we extracted the Euclidean inter- 
point distances and used them as input dissimilarities. 
The embedding error as a function of the scaling factor 
a is shown in Fig. [8] Each value in the diagram is 
obtained as a minimum embedding error in a series 
of 100 replicates starting from randomly chosen initial 
configurations. Minimal embedding error overall was 
achieved for a ^ A. The improvement with respect to 
the 2-dimensional Euclidean case was 10%. The Iris 
dataset is an example of dimensionality reduction of 
an original higher-dimensional dataset that can be done 
more successfully using the PD model. 

2) Political Books: An interesting network was pre- 
sented by Krebs] ( 2008| ), who assembled a connectivity 
graph of political books frequently bought together dur- 
ing an election campaign. In the graph version we used, 
there were 105 nodes representing books and a total 
of 441 undirected, unweighted links between books that 
were frequently bought together. We obtained dissimilar- 
ities by assigning self-dissimilarity of 0, dissimilarity of 
1 for co-purchased books and a missing (unknown) dis- 
similarity for the remaining pairs and applied MDS-PD 
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Figure 8. The effect of scaling of the dissimilarities on the 
embedding error for the Iris Dataset i Anderson| |1935| >. The input 
dissimilarities are the Euclidean distances between pairs of original 
points. This MDS-PD result reveals that the Iris dataset is better suited 
for embedding to the hyperbolic plane that to the Euclidean plane. 



Figure 10. The embedding error as a function of the input scaling 
factor for the network of citations (Newman 2001 1. The network is 
an example of weighted graph data that can be encoded with lower 
embedding error in the PD model than in the Euclidean plane. The 
input dissimilarities are capturing the strength of the collaboration 
between pairs of authors. 
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3) Citation Network: The citation network that we 
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Figure 9. The embedding error as a function of the input scaling 
factor for the network of political books (Krebs 2008). The input 
dissimilarities are simply indicators of the presence or absence of 
links between the network nodes. The political books network is an 
example of unweighted, undirected real-world graph data that can be 
embedded with lower error in the PD model than in the Euclidean 
plane. 



to the resulting dissimilarity matrix. We also conducted 
the experiment using only the liberal and the conser- 
vative subgraphs (43 and 49 points respectively). The 
obtained minimum embedding error of 150 replicates 
as a function of the scaling factor a is shown in Fig. 
[9] We note that there were remarkable gains of using 
the PD model instead of the Euclidean plane. For the 
overall graph, the minimal stress was 7.6 times smaller 
than the Euclidean stress. The liberal and conservative 
components alone achieved improvement of 8.8 and 9 
times with respect to the Euclidean case. 



used in this experiment was compiled by Newman (2001 



from bibliographies of review articles on networking. 
We extracted the largest connected component from the 
graph which consisted of 379 nodes representing authors. 
There were 244 edges in the graph with weights sjj^ 
representing the strength of the collaborative ties. We 
have calculated dissimilarities from these data using 
5jk = const — Sjk and applied the MDS-PD algorithm. 
The obtained minimum embedding error of 50 replicates 



as a function of the scaling factor a is shown in Fig. 10 



The overall minimum embedding error was 2.63 times 
lower than the stress obtained using Euclidean MDS. 

V. RELATED WORK 

A multidimensional scaling algorithm for fitting dis- 
tances to constant-curvature Riemannian spaces was 
given by Lindman and CaeTIi| fl978 ). This work uses 
the hyperboloid model of the hyperbolic space which re- 
quires an « + 1 -dimensional Euclidean space to represent 
an n-dimensional hyperbolic space, and is less suitable 
for visualization purposes in the case of the hyperbolic 
plane. 

The applicability of metric multidimensional scaling 
in the Poincare disk model of the hyperbolic plane 



was studied by |Walter| ( |2004| ). The study focused on 
the task of embedding higher-dimensional point sets 
into 2-dimensional configurations for the purpose of 
interactive visualization. It was demonstrated that the PD 
has capacity to accommodate lower stress embedding 
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Figure 11. Comparison of the point trajectories: H-MDS of J. Walter 
|Walter|p004l l (left) vs. MDS-PD hyperbolic lines (right) 



than the EucUdean plane. Several important pointers to 
the difficulties one encounters in implementing such 
algorithms were given, but a definite specification or 
implementation was not provided. 

The proposed method to convert the seemingly con- 
strained optimization problem to an unconstrained one 
( Walter|2004 Eq. 12) ensures that the moving configura- 
tion would stay inside the model during the optimization. 
However, this transformation fails to follow the distance 
realizing (hyperbolic) lines, or even Euclidean lines. The 
problem is illustrated in Fig. [TT] The possibility that 
the dissimilarity matrix has missing values was also 
not addressed in this work, as the dissimilarities were 
generated from higher-dimensional points. Inputs derived 
from graph data, however, are typically sparse. 

|Shavitt and Tankel, presented numerical exam- 

ples illustrating that models of the hyperbolic space with 
circular symmetry may be better suited than Euclidean 
spaces for embedding network graphs with core-and- 
tendrils structure. Their work concentrated on applica- 
tions specific to communication networks where dissim- 
ilarities for each pair of points are derived from the 
lengths of the shortest paths in the graph. For such 
applications, the authors' insight about the choice of the 
Poincare disk model was that the shortest paths in the 
studied networks often pass through the core and are 
therefore longer than the straight-line distance, and this 
observation empirically matches the behavior of distance 
function in the chosen model. In order to avoid the 
constrained nature of the coordinates in the PD, the 
authors eventually resorted to the hyperboloid model of 
the hyperbolic plane, omitting the details. 

The "big-bang simulation" (BBS) numerical method 



used in (Shavitt and Tankel 20081, is discussed in 



( Shavitt and Tankel|200 4l). BBS is a variant of a steepest 
descent method that models the point configuration as an 



inertial system in a force-generating field. Termination 
is guaranteed by introducing empirical dampening in 
the mechanical system. The initial configuration in BBS 
is always chosen to be a single point in which all 
particles are collocated, ensuring a fair initial amount 
of potential energy. Another heuristic feature of BBS 
is that the objective function changes several times 
during the minimization in a way that increases the error 
sensitivity of the penalty terms. The particle inertia in 
BBS in conjunction with a stepwise changing objective 
function possibly allows the method to escape a few local 
minima before termination. However, the advantages of 
these heuristics in avoiding local minima, compared to a 
computationally simpler, single phase minimization pro- 
cedure, were not clearly demonstrated. It is conceivable 
that the inertial minimum-avoiding mechanism, which 
comes at an increased computational cost, may as well 
cause the configuration to leave the global minimum, or 
a lower local minimum before stopping in a higher one. 
Finally, since BBS can only be started from one possible 
initial configuration, it has a deterministic outcome once 
the heuristic parameters such as friction and time slice 
are chosen; with this choice, the possibility that the final 
result is improved by restarting from different initial 
conditions, is eliminated. 

Numerous methods that are more likely to find a lower 
minimum than the simplest repeated descent methods in 
a single run have been contemplated in the numerical 
optimization literature. However, to guarantee in general 
that the global minimizer is found is difficult with any 
such method. It may be necessary to resort to running the 
sophisticated methods several times as well in order to 
gain confidence in the final result. Since these methods 
are usually computationally more complex or incorporate 
a larger number of heuristic parameters, the incurred 
computational and implementational costs often offset 
the benefits of their sophistication. 

VI. CONCLUSION 

We developed the details of MDS-PD, an iterative 
minimization method for metric multidimensional scal- 
ing of dissimilarity data in the Poincare disk model of the 
hyperbolic plane. While our exposition concentrated on 
a simple steepest descent minimization with approximate 
binary hyperbolic line search, we believe that elements 
of the presented material will also be useful as a general 
recipe for transferring other, more sophisticated iterative 
methods of unconstrained optimization to various models 
of the hyperbolic space. 
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We believe that the algorithm specification herein 
allows for easy implementation and represents a useful 
novel tool for the scientific community and that its 
application will enable a wide range of scientific inquiry 
spanning beyond the area of numerical optimization 
methods. 

Our initial numerical experiments using both syn- 
thetic and real-world data suggest that MDS-PD achieves 
significantly better results compared to its Euclidean 
counterpart for naturally arising networks of interaction. 
This behavior may ultimately be attributed to the less- 
restricted axiomatic foundation of the hyperbolic geom- 
etry compared to its Euclidean counterpart. Our results 
justify the development effort and encourage future work 
in the directions of generalizing this algorithm to higher- 
dimensional models of the hyperbolic space, improving 
its efficiency, and establishing novel uses and applica- 
tions. 
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